Things that should affect the matchup:
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 10.
##
## Energy:
## E-BFMI indicated no pathological behavior.
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 10.
##
## Energy:
## E-BFMI indicated no pathological behavior.
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 10.
##
## Energy:
## E-BFMI indicated no pathological behavior.
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 10.
##
## Energy:
## E-BFMI indicated no pathological behavior.
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 12.
##
## Energy:
## E-BFMI indicated no pathological behavior.
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 12.
##
## Energy:
## E-BFMI indicated no pathological behavior.
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 12.
##
## Energy:
## E-BFMI indicated no pathological behavior.
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 12.
##
## Energy:
## E-BFMI indicated no pathological behavior.
## $`Partial pooling on independent decks`
## NULL
##
## $`Partial pooling on starters and specs`
## NULL
##
## $`Partial pooling and interactions on starters and specs`
## NULL
##
## $`Partial pooling and full interaction between starters and specs`
## NULL
##
## $`Versus model on starters and specs, forum data only`
## NULL
##
## $`Versus model on starters and specs with Metal data`
## NULL
##
## $`Versus model on starters and specs with full Metal data`
## NULL
##
## $`Versus model on starters and specs with negative player skills, forum data only`
## NULL
See Prior choice.
In this version, we consider only players, decks as a whole (rather than their components), and the turn effect on both.
We don’t bother showing the prior samples here, because they’re the same as before.
It’s hard for me to get an idea of how accurate the model is from these results, so let’s look at its predictions for some matches. Looking at the model’s post-hoc predictions for the matches used to fit it is a cheat, since we’re using the data twice, but it should give a rough idea of how good the model is.
First, we’ll just list the predicted probability for the observed outcome of each match.
There aren’t many upsets here:
This can be seen more easily in the density plot below.
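As a rough illustration of how the per-match probabilities above can be derived, here is a minimal sketch. The data frame `matches` and its columns `p1_win_prob` and `p1_won` are hypothetical stand-ins, not the names used in the actual analysis, and the numbers are made up.

```r
# Hypothetical data: the model's predicted probability that player 1 wins,
# and whether player 1 actually won. These values are invented.
matches <- data.frame(
  p1_win_prob = c(0.80, 0.35, 0.60),
  p1_won      = c(TRUE, FALSE, FALSE)
)

# Probability the model assigned to what actually happened:
# p if player 1 won, 1 - p otherwise.
matches$observed_prob <- ifelse(matches$p1_won,
                                matches$p1_win_prob,
                                1 - matches$p1_win_prob)

# Call a result an "upset" if the model gave it less than a 50% chance.
matches$upset <- matches$observed_prob < 0.5

print(matches)
```

With these invented numbers, only the third match counts as an upset, since the model gave the observed result a 40% chance.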
The post-hoc predictions are heavily lopsided towards unbalanced matchups. Well, these are post-hoc predictions, so we’d expect the predictions to lean towards being correct, to some extent. What might be more helpful is to compare how often player 1 actually wins matches with how often the model thinks they should.
It looks like the post-hoc predictions currently aren’t lopsided enough! The actual outcomes are even more extreme than predicted.
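One way to make this comparison is to bin matches by predicted probability and compare the mean prediction in each bin with the observed win rate. The sketch below uses simulated data and hypothetical variable names (`p1_win_prob`, `p1_won`); it is not the code behind the plot, and the bin width of 0.2 is an arbitrary choice.

```r
set.seed(1)

# Simulated stand-in for the real data: predicted P1 win probabilities,
# and outcomes drawn so that the predictions are well calibrated.
n <- 200
p1_win_prob <- runif(n)
p1_won <- rbinom(n, size = 1, prob = p1_win_prob)

# Group matches into five bins by predicted probability.
bins <- cut(p1_win_prob, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)

# For each bin: mean prediction, observed P1 win rate, and match count.
calibration <- data.frame(
  bin       = levels(bins),
  predicted = as.vector(tapply(p1_win_prob, bins, mean)),
  observed  = as.vector(tapply(p1_won, bins, mean)),
  matches   = as.vector(table(bins))
)

print(calibration)
```

If the observed rates are further from 0.5 than the predicted ones in the outer bins, the predictions are not lopsided enough, which is the pattern described above.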
We also give the average score for each model, for two types of proper scoring rule:
For reference, putting 50-50 odds on player 1 winning would give a logarithmic score of 0.6931472 (that is, log 2) and a Brier score of 0.25, regardless of the true win rate. The scores have different signs, but in both cases better models have a score closer to zero. So we’re definitely doing better at predicting match results than just flipping a coin, and I don’t have to hang up my statistician hat in shame. You can also see how much improvement we got just from switching to a “versus model”, where deck components are only evaluated relative to the opposing deck components.
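The coin-flip reference values can be checked directly. This is a minimal sketch, assuming the logarithmic score is taken as the negative log of the probability assigned to the observed outcome and the Brier score as the squared error of the predicted probability. The function names are illustrative, and the sign convention used for the reported scores may differ.

```r
# Logarithmic score: negative log of the probability given to what happened.
# (Sign convention is an assumption; some sources report the log itself.)
log_score <- function(p, won) -log(ifelse(won, p, 1 - p))

# Brier score: squared difference between prediction and outcome (0 or 1).
brier_score <- function(p, won) (p - as.numeric(won))^2

coin_flip <- 0.5

# A 50-50 prediction scores the same whether player 1 wins or loses.
log_score(coin_flip, TRUE)     # log(2), about 0.6931472
log_score(coin_flip, FALSE)    # also log(2)
brier_score(coin_flip, TRUE)   # 0.25
brier_score(coin_flip, FALSE)  # 0.25
```

Because a 50-50 prediction gets the same score for either outcome, its average score does not depend on how often player 1 actually wins.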
And now for deck info (The DeGrey-ding deck is [Present]/Strength/Anarchy):
## Purple/Finesse White/Ninjitsu Red/Fire Black/Discipline
## -0.3103452423 -0.2319162543 -0.2226147568 -0.1691474648
## Neutral/Anarchy Purple/Balance Neutral/Growth Neutral/Discipline
## -0.1610282786 -0.1418095916 -0.1384370357 -0.1165525082
## White/Growth Red/Demonology Black/Law White/Truth
## -0.1123728707 -0.1085387434 -0.1082192625 -0.1044260592
## Black/Truth White/Fire Red/Disease Blue/Truth
## -0.1042230843 -0.1025983148 -0.1022270518 -0.0994462186
## Red/Truth Red/Blood Green/Feral Neutral/Feral
## -0.0963077918 -0.0927326251 -0.0898220735 -0.0893438132
## Red/Present Purple/Feral Black/Finesse Blue/Peace
## -0.0869380962 -0.0855959276 -0.0847593582 -0.0826820165
## Black/Future Neutral/Disease White/Necromancy Blue/Anarchy
## -0.0810185804 -0.0755938006 -0.0729893086 -0.0723147749
## Blue/Balance Green/Law White/Discipline White/Feral
## -0.0703661482 -0.0674767841 -0.0669843768 -0.0637144312
## Black/Strength Neutral/Balance Red/Finesse Green/Peace
## -0.0629020929 -0.0617714755 -0.0615365473 -0.0586553930
## Black/Present Neutral/Demonology Blue/Disease Blue/Discipline
## -0.0584456351 -0.0565806325 -0.0539330816 -0.0538362301
## Black/Growth Neutral/Law Green/Anarchy Black/Peace
## -0.0524527073 -0.0516126153 -0.0516050495 -0.0506812883
## Red/Past Purple/Anarchy Green/Blood Neutral/Present
## -0.0486322123 -0.0470434578 -0.0431687247 -0.0333807017
## Green/Future Neutral/Ninjitsu Purple/Future Green/Finesse
## -0.0286116176 -0.0276785712 -0.0268192102 -0.0221347698
## Green/Demonology Neutral/Fire Red/Discipline Blue/Past
## -0.0215432332 -0.0196479186 -0.0186855461 -0.0172415786
## Neutral/Finesse White/Balance White/Peace Purple/Law
## -0.0154442053 -0.0134172529 -0.0127068804 -0.0081410746
## Blue/Fire Blue/Feral Black/Feral Blue/Strength
## -0.0061764994 -0.0059306906 -0.0059244511 -0.0048559827
## Neutral/Past Green/Past White/Future Red/Future
## -0.0047435754 -0.0042487428 -0.0038975838 -0.0036932884
## Blue/Bashing Green/Fire Red/Ninjitsu Red/Peace
## -0.0033118002 -0.0030087429 -0.0027044074 -0.0027034816
## Blue/Ninjitsu Black/Ninjitsu White/Bashing Green/Discipline
## -0.0018385937 -0.0013090756 -0.0012413424 -0.0010294659
## Black/Past Blue/Necromancy Blue/Blood Blue/Present
## -0.0008021038 -0.0007309166 -0.0005411002 -0.0004122199
## Purple/Disease Purple/Ninjitsu Red/Bashing Black/Bashing
## -0.0001772368 0.0001461503 0.0002211268 0.0003178291
## Purple/Discipline Purple/Truth Purple/Bashing Neutral/Future
## 0.0006328003 0.0007684904 0.0010770861 0.0017715380
## Blue/Growth Red/Balance Blue/Demonology Purple/Past
## 0.0022658816 0.0024995449 0.0039317643 0.0039657235
## Blue/Future Purple/Fire Green/Growth Green/Ninjitsu
## 0.0045078487 0.0058554120 0.0062032863 0.0063856326
## Neutral/Bashing Red/Law Green/Bashing Blue/Law
## 0.0077008357 0.0078900241 0.0084118540 0.0124203907
## Purple/Growth Red/Feral Purple/Strength Neutral/Strength
## 0.0125238524 0.0150078694 0.0271956819 0.0303500917
## Black/Blood White/Demonology Red/Necromancy Neutral/Blood
## 0.0303688652 0.0330827185 0.0356370208 0.0384560799
## Purple/Present Green/Necromancy Green/Truth White/Past
## 0.0420310430 0.0425639696 0.0431436551 0.0440370243
## White/Disease White/Present Green/Strength Green/Disease
## 0.0451647433 0.0544240645 0.0565701468 0.0676941492
## Red/Anarchy Green/Balance Black/Balance Purple/Necromancy
## 0.0688197416 0.0715459729 0.0768072710 0.0867323367
## White/Anarchy Green/Present Black/Disease Purple/Peace
## 0.0886880348 0.0897687253 0.1033477643 0.1056341736
## White/Strength Neutral/Necromancy White/Blood Red/Growth
## 0.1084731250 0.1204313819 0.1211758152 0.1293758949
## Blue/Finesse Red/Strength Black/Demonology Purple/Demonology
## 0.1396179638 0.1480477716 0.1510609558 0.1933551883
## Neutral/Truth Purple/Blood Black/Necromancy Neutral/Peace
## 0.1941557007 0.2129897239 0.2280178147 0.2713143742
## White/Law Black/Fire Black/Anarchy White/Finesse
## 0.2740246064 0.3004659831 0.3432983694 0.4100709571
In retrospect, a deck’s strength will be heavily dependent on the opposing deck, so ranking deck strengths on a single scale is silly. So we now use models that evaluate deck parts only with respect to an opposing part.
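To illustrate the idea of evaluating parts only against opposing parts, here is a toy sketch. It is not the fitted model: the effect values are invented, the antisymmetry of the `versus` matrix is an assumption, and player skill and turn order are left out entirely.

```r
# Hypothetical pairwise "versus" effects on the log-odds scale:
# versus[a, b] is the advantage of part a when facing part b.
parts <- c("Strength", "Anarchy", "Finesse", "Truth")
versus <- matrix(0, nrow = 4, ncol = 4, dimnames = list(parts, parts))
versus["Strength", "Finesse"] <- 0.3   # invented value
versus["Anarchy", "Truth"]    <- -0.1  # invented value

# Assumption: a's advantage over b is exactly b's disadvantage against a.
versus <- versus - t(versus)

# Matchup log-odds for deck 1: sum of effects over every pair
# (part of deck 1, part of deck 2).
matchup_log_odds <- function(deck1, deck2, versus) {
  sum(versus[deck1, deck2])
}

deck1 <- c("Strength", "Anarchy")
deck2 <- c("Finesse", "Truth")

# Convert log-odds to a win probability for deck 1.
p_deck1 <- plogis(matchup_log_odds(deck1, deck2, versus))
p_deck2 <- plogis(matchup_log_odds(deck2, deck1, versus))
print(c(deck1 = p_deck1, deck2 = p_deck2))
```

In this setup no part has a standalone strength; a part only contributes through the opposing parts it meets, so the same deck can be favoured in one matchup and an underdog in another.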
This is getting rather crowded! Here’s a version with only players that have been active in 2018 or 2019.
This includes all the casual matches recorded by Metalize, in addition to the tournament matches.
Here are the base and negative-skill plots next to each other, for comparison: